Abstract. We introduce and define the concept of a stochastic pooling network 
(SPN), as a model for sensor systems where redundancy and two forms of 'noise' 
- lossy compression and randomness - interact in surprising ways. Our approach 
to analyzing SPNs is information theoretic. We define an SPN as a network 
with multiple nodes that each produce noisy and compressed measurements of 
the same information. An SPN must combine all these measurements into a 
single further compressed network output, in a way dictated solely by naturally 
occurring physical properties - i.e. pooling - and yet causes no (or negligible) 
reduction in mutual information. This means SPNs exhibit redundancy reduction 
as an emergent property of pooling. The SPN concept is applicable to examples 
in biological neural coding, nano-electronics, distributed sensor networks, digital 
beamforming arrays, image processing, multiaccess communication networks and 
social networks. In most cases the randomness is assumed to be unavoidably 
present rather than deliberately introduced. We illustrate the central properties of 
SPNs for several case studies, where pooling occurs by summation, including nodes 
that are noisy scalar quantizers, and nodes with conditionally Poisson statistics. 
Other emergent properties of SPNs and some unsolved problems are also briefly 
discussed. 

1. Introduction and Background 

Two challenges are ubiquitous for many forms of signal and information processing 
tasks, whether in biology, or artificial technology. These are (i) robustness to the 
effects of random noise, and (ii) efficient extraction of only the 'information' relevant 
for achieving some goal. The latter challenge can require lossy compression, where 
some 'information' is intentionally discarded. Robustness to unavoidable random noise 
or fluctuations is often achieved using a network or array of sensors. It is less obvious 
that a network approach can simultaneously achieve both useful lossy compression 
and noise reduction. 

Lossy compression, far from being a limitation, is usually very desirable, because 
the lost information is deemed either redundant or irrelevant [1]. For example, 
when a histogram is formed from real-valued data, information about individual 
measurements is discarded in order to draw conclusions about general trends. Any 
two measurements that fall within the same histogram bin are treated as equivalent 
even though the original measurements may have been different. As a further example, 
the utility of lossy compression techniques based on what is perceptible by our senses, 
such as the JPEG and MP3 standards, is nowadays self-evident. 

The goal of this paper is to introduce and define a concept we call a stochastic 
pooling network (SPN). This name was first proposed in [2] to describe a network of 
sensors where each sensor produces binary measurements of a common information 
source. In the process of pooling, this network simultaneously reduces random 
fluctuations - i.e. noise - via an averaging-like effect, and further compresses the 
data. The signal processing goal in [2] was a binary detection problem. 

The concept described in the current paper evolved from previous work on a 
model [3] that can now be described as a special case of an SPN. That model 
was studied in the context of a form of stochastic resonance [4] [5] [6] known as 
suprathreshold stochastic resonance. Nowadays stochastic resonance is a generic 
term that describes any phenomenon where random noise in a nonlinear system can 
provide a signal processing benefit [7] [8]. The simple SPN in [3] exhibits stochastic 
resonance in a much more pronounced way than conventionally is the case, in that 
'noise benefits' occur for suprathreshold signals and do not rely on small input 
signal-to-noise ratios (SNR) [3]. The same model is well suited for studying the information 
coding properties of populations of parallel neurons [9] [10] [11]. 

We emphasize that here we aim to define a stochastic pooling network in a 
way that extends its scope significantly beyond that of [3] [2]. In particular, our 
definition means the nodes in the network can be far more complex than the simple 
quantizers considered in [3] [4], and in related studies on suprathreshold stochastic 
resonance [4] [12] [13] [14] [15] [16]. 

Here we are also not focussed on suprathreshold stochastic resonance, but 
emphasize instead the following essential features that a system must possess in order 
to fit our definition of an SPN: 

• multiple noise sources - usually unavoidable and uncontrollable - corrupting 
multiple measurements of the same sample of an 'information source'; 

• lossy compression of each noisy measurement; 

• 'pooling' of the multiple noisy and compressed measurements to a single 
measurement that has fewer states than the vector of individual measurements. 

By pooling, we mean that the multiple measurements of the 'information source' 
are constrained to being combined in a simple way that is not controllable or reversible, 
and that loses the details of which measurement originated from each measurement 
source. Below we use the term pooling function when mathematically describing the 
precise manner in which measurements are pooled - see Section [3]. 

Our motivation for defining an SPN originated from research into two 
fundamental scientific questions about biological neurons: 

(i) What mechanisms do biological neural systems use to compress information about 
external stimuli during transduction at the sensory periphery? 

(ii) Do unpredictable fluctuations in neural activity in sensory transduction processes 
contribute to coding/compression effectiveness? If so, is this achieved in 
conjunction with redundancy? 

The three goals of this paper are to (i) define a stochastic pooling network; (ii) 
illustrate that SPNs are a generically useful paradigm for a diverse range of scenarios 
beyond biological neural coding; and (iii) illustrate that SPNs can display surprising 
emergent behaviour that is closely related to their capabilities for noise reduction and 
compression, and are therefore interesting to study in their own right. 

We proceed in Section [2] by qualitatively describing a wide range of systems, 
ranging from the nanoscale to the macroscopic, that illustrate the diversity of the 
SPN model. Next, in Section [3] we state our definition of an SPN. In Section [4] we 
present some simple SPN case studies where we use information theory to explore 
some of the consequences of our definition. Finally, in Section [5] we discuss several 
general emergent phenomena that can be expected to occur in any SPN, list some 
unsolved problems for SPNs, and present some suggestions for further work on the 
topic. 

2. Examples of Stochastic Pooling Networks 

The term node is used to refer to each of N independently random measurements in an 
SPN. Note that in some of our discussion we loosely use the word 'sensor' to describe 
the nodes of an SPN. This may be slightly misleading for some scenarios, such as 
neural populations, where each node is a parallel nerve fibre. In such circumstances, 
the word 'channel' may be more appropriate. 

A simple thought experiment illustrates the SPN principles outlined in Section [1]. 
Suppose a psychophysicist asks 100 people to vote 'yes' or 'no' on whether they think 
a sound is 'loud.' After listening to the sound, each person writes down their vote 
and places it in a hat. The convenor then counts the votes for 'yes,' and obtains a 
measurement between 0 and 100. If the experiment is repeated for many randomly 
chosen sound volumes, the vote counts provide noisy and compressed estimates of each 
sound volume. They are compressed because an estimate of an analogue volume is 
quantized to a discrete scale. They are noisy, because repeated presentation of the 
same volume may result in a different count. Other features of this thought experiment 
include (i) multiple noisy measurements of the same information source - each person 
may perceive a sound with different biases; (ii) extremely lossy compression in each 
measurement - each person is forced to vote on a binary scale; (iii) pooling of 
measurements - the convenor doesn't know who votes yes. The examples below outline 
how the same ideas apply to various data acquisition and processing systems. 
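
The voting experiment can be mimicked numerically. The following is a minimal sketch, under assumptions of our own that are not part of the thought experiment: each voter perceives the volume with independent Gaussian noise and votes 'yes' when the percept exceeds a common threshold. The names `pooled_vote`, `threshold` and `noise_sd` are hypothetical.

```python
import random

def pooled_vote(volume, n_voters=100, threshold=0.0, noise_sd=1.0, rng=None):
    """Return the number of 'yes' votes for one presentation of a sound.

    Each voter compares a privately noisy percept of `volume` with `threshold`
    (a binary, extremely lossy measurement). Only the count is returned, so the
    identity of the voters is lost, as in the hat.
    """
    rng = rng or random.Random()
    return sum(1 for _ in range(n_voters)
               if volume + rng.gauss(0.0, noise_sd) > threshold)

rng = random.Random(0)
for volume in (-2.0, 0.0, 2.0):
    # Repeated presentations of the same volume generally give different counts.
    counts = [pooled_vote(volume, rng=rng) for _ in range(5)]
    print(volume, counts)
```

The count takes one of 101 values, whereas the vector of individual votes has 2^100 possible states; this is the sense in which pooling compresses further.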

2.1. Artificial Sensor and Communications Networks 

A number of modern sensor and communications networks can be modelled as an 
SPN, mostly due to some constraint on the network topology, such as limited power 
in each sensor, or the simplicity inherent in having multiple identical nodes [17]. An 
abstraction of such a problem in distributed source coding is known as the Chief 
Executive Officer (CEO) problem [18]. However, work in the signal processing 
literature is usually concerned with finding optimal designs for aspects of such 
networks. Our definition of an SPN instead includes a requirement that coding 
and pooling occur naturally due to physical properties, and we aim to study the 
consequences of this, and whether optimal results can emerge. Distributed detection 
schemes that utilize multiaccess channels [19] are a good example of an artificial network 
where this constraint may be realistic. Indeed, some studies of the latter case [20] can 
be easily mapped to the binary detection SPN [2]. One of the earliest digital sonar 
beamforming systems, known as the DIMUS sonar array, can also be described in this 
way [21], as well as more recent sonar systems like the Barra sonobuoy. 

2.2. Biological Neurons 

The emergent properties of the simple network considered in [3], such as 
suprathreshold stochastic resonance, have been observed for its extension to 
networks of neuron models, including the FitzHugh-Nagumo [9] and Hodgkin-Huxley 
models [11]. Suprathreshold stochastic resonance has also been observed when 
replacing additive noise in each node by multiplicative noise [22]. The group of 
cochlear nerve fibres that synapse with a single hair cell in the inner ear, and transduce 
analogue sound waveforms, is an excellent candidate for description as an SPN. This 
is because multiple parallel nerve fibres code the same signal using action potentials, 
but have conditionally independent variability. It is also possible that modulation 
of synaptic neurotransmitters can be described as an SPN [24]. Such questions are 
now being addressed in experiments on living neural networks [25]. 

We expect that studies of SPN models may be useful for the design of future neural 
prosthetics, such as cochlear implants. These are surgically implanted biomedical 
prosthetics that allow profoundly deaf people to hear, via electrical stimulation of the 
cochlear nerve [26]. This electrical stimulation does not replicate the independent 
variability of healthy cochlear nerve fibres, and many nerve fibres are therefore 
redundant. It has been proposed that cochlear implants may be improved by the 
controlled introduction of suprathreshold stochastic resonance [10] [27]. The SPN 
model is a useful way of understanding why cochlear nerve fibres are naturally 
independently noisy, and why replicating this in cochlear implant stimulation is 
desirable [28]. Furthermore, the concept of 'pooling' is also relevant to problems 
in brain-machine interfaces where recordings need to be made from neural activity. In 
this context, pooling is similar to what is known as 'aggregation'. 

2.3. Analog-to-Digital Converter (ADC) Circuits and Digitized Beamforming 

Continuing advances in digital signal processing technology have led to a trend for 
the ADC in digital communications networks and sensors to be shifted 'as close to 
the antenna as possible.' A simple signal processing task affected by this trend is in- 
band random noise reduction via averaging. If averaging is carried out on sequential 
samples, it is known in radar or sonar as coherent or pre-detection integration. Spatial 
averaging also results during the process of beamforming. Assuming infinite precision, 
and finite variance noise, coherently averaging $N$ independently noisy observations of 
the same signal of course reduces the noise by a factor of $\sqrt{N}$. 

However, if each measurement has been digitized first, noise is no longer simply 
reduced by this factor, and the problem can be abstracted as an SPN. Performance of 
this 'averaging of digitized signals' depends on the input SNR, the probability 
distributions of both the signal and the noise, and the number of bits of the 
ADC [30] [31]. This scenario is different from dithered ADCs in that more than one 
noise source is assumed in an SPN [5], although it can be thought of as similar to a 
spatial translation of the low-pass filtering properties of oversampling ADCs [32]. A 
single ADC where each voltage comparator is an independently noisy node may also 
be modelled as an SPN [15] [16]. 
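
The difference between averaging analogue observations and averaging digitized ones can be illustrated with a short simulation. This is a sketch under assumptions of our own: Gaussian noise, a constant input, and a 1-bit ADC modelled as a comparator at zero. The function name `averaged_outputs` and all parameter values are hypothetical.

```python
import random
import statistics

def averaged_outputs(x, n_nodes, noise_sd, trials, quantize, rng):
    """Average n_nodes independently noisy observations of x, repeated `trials` times.

    If quantize is True, each observation first passes through a 1-bit ADC
    (a comparator at zero), which is an illustrative choice of digitizer.
    """
    results = []
    for _ in range(trials):
        samples = [x + rng.gauss(0.0, noise_sd) for _ in range(n_nodes)]
        if quantize:
            samples = [1.0 if s > 0.0 else 0.0 for s in samples]
        results.append(sum(samples) / n_nodes)
    return results

rng = random.Random(1)
analog = averaged_outputs(0.0, 100, 1.0, 2000, quantize=False, rng=rng)
digital = averaged_outputs(0.0, 100, 1.0, 2000, quantize=True, rng=rng)

# Without quantization the spread of the average is close to noise_sd / sqrt(N).
print("std of analogue average:", statistics.pstdev(analog))
# With 1-bit quantization the average estimates P(x + noise > 0), not x itself.
print("mean of 1-bit average  :", statistics.mean(digital))
```

In the quantized case the average is a function of the signal only through the noise distribution, which is why performance depends on the signal and noise distributions and not just on $N$.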

2.4. Other Candidates for SPN Modelling 

The simple binary SPN first considered in [3] has provided inspiration for novel 
reliability schemes in nanoscale electronics [33]. At this scale, SNRs are very small, 
meaning it may be impossible for traditional noise reduction methods to operate. 
New methods are required [34], including the possibility of using SPN-like arrays to 
reduce noise [33]. SPNs may also be useful in studies of complex social networks. The 
subjective voting example given at the start of this section is a very simple case, and 
could be extended in numerous directions. We also suggest that any other situation 
where compressed measurements are averaged, such as some image processing tasks, 
may be suitable for studying as an SPN. 

3. Stochastic Pooling Networks: Definition 

For the current paper we define an SPN as a network without any feedback, 
co-operation, adaptation, or side-information, although we believe such features could be 
introduced without altering the basic concept. Otherwise, the following definition is, 
by design, very general. In Section [4] we consider specific examples that illustrate the 
interesting features of SPNs, and for which we can quantify performance. 

3.1. Essential Features 

We begin by expanding on the three features asserted in Section [1] to be essential 
elements of an SPN. 

Multiple sensors make stochastic observations of a common signal: The signal 
being measured may be any information source†, and in general may be either 
scalar or vector. By stochastic observations we mean that each measurement is 
a random variable when conditioned on the common signal. This stochasticity 
may arise either externally - e.g. additive or multiplicative random noise - or 
internally in a node. The nodes' output measurements may be either conditionally 
independent of each other, or correlated, but not perfectly correlated, as then 
those nodes could be treated as a single node. 

† We use the term 'information source' qualitatively, but intend it to refer to some parameter that 
it is desirable to measure, and that can be modeled as a random variable. 

Each sensor communicates compressed measurements: The input to a node 
may be the information source itself, or a mixture of the information source and 
random noise. Each node in the network is required to carry out lossy compression 
on its input, meaning the node output is a discrete variable with fewer states than 
the input, and the original input cannot be exactly recovered from the output. 
Different compression operations are permissible in each node. 

Measurements are combined by pooling: The outputs of each node are combined 
at a 'fusion hub' to form an overall network output consisting of a single 
observable measurement of the information source. This combining is characterised 
by a pooling function, i.e. an abstraction that accounts for all corruption 
or processing of the vector of the outputs from an SPN's individual nodes. 

Note that we have not yet fully defined some restrictions on the form of pooling function 
that make an SPN interesting to study - we will address this below. 

The first two features listed above are deliberately very general, and it is not 
particularly useful to define the nodes of an SPN solely in such terms. For example, 
we have referred to individual node outputs as not being 'perfectly correlated.' While 
in many cases we would like to assume that the conditional outputs of all nodes are 
independent, we do not want to exclude the possibility of correlation. We also use 
the general term 'lossy compression,' which could mean the application of a complex 
algorithm. However, it is more likely that very simple compression, e.g. signal 
quantization such as occurs in an analog-to-digital converter circuit, is required to 
fit the definition of 'pooling.' Hence, particular case studies of SPNs should carefully 
define the properties of individual network nodes, since although the first two general 
properties are necessary conditions, the precise nature of the nodes is less important 
to the overall concept of an SPN. Instead, the primary property that needs careful 
definition is the third one, i.e. the nature of the pooling function. 

3.2. The Pooling Function 

The pooling function can be thought of as a measurement 'fusion hub.' In signal 
processing applications, it is usually assumed that fusion algorithms are open to 
engineered design, and that the whole toolkit of optimal filtering and compression 
algorithms is available for the design. 

Is this the case for an SPN? Our answer to this question is no, and we wish to 
embed this assertion in our definition of an SPN, and more specifically, in a definition 
of what it means to 'pool' in an SPN. This approach follows from our motivation. We are 
not interested here in designing a good or optimal network as a whole, or in designing 
good or optimal fusion and compression schemes. Techniques for achieving such tasks 
abound in the modern communications engineering and signal processing literature 
and practice. 

Instead, we want to address the question of whether good or optimal fusion can 
emerge naturally in physical and biological systems. That is, can networks naturally 
combine - compressively - correlated measurements in a way that does not lose any 
information or loses negligible information? We discuss more precisely what we mean 
by 'loss of information' in Section [3.3]. This motivation means we are interested in 
modelling the part of the SPN between the points where (i) each node produces a 
compressed measurement, and (ii) the whole network produces its final output, as a 
channel. That is, this pathway is an environment governed by physical properties and 
constraints that cannot be controlled by external intervention or adaptation. This 
means we can define 'pooling' in terms of properties of a physical channel. It is this 
notion that most contributes to making an SPN, as we have defined it, a generically 
useful and interesting model. 

This leads us to at last state a definition of a stochastic pooling network. 

A stochastic pooling network is a network with the property that 
multiple parallel stochastic and compressed measurements of a 
common input signal are combined into a single measurement by 
a physical channel, in such a way that pooling of measurements 
causes no (or negligible) further loss of information about the 
network's input signal, when compared with the best performance 
that could be achieved if all sensor measurements were available. 

Note that if a network exists such that pooling is highly lossy, then it is excluded from 
our definition. 

Our intention with this definition is to move the emphasis from the precise 
properties of individual nodes onto the properties of the whole network. This does 
create a difficulty in precisely defining the pooling function - i.e., what class of 
functions should be specifically excluded, and what functions are 'simple' enough to 
emerge naturally from physical properties? We therefore do not propose such a class 
of functions here, but use the example of pooling by summation in Section [4]. We can 
envisage also pooling functions being the maximum or minimum measurement, or the 
majority vote, and - as discussed in Section [3.4] - any sufficient statistic created from 
the individual measurements. However, we have yet to fully explore the possibilities. 
Provided that a network's pooling function emerges from physical properties rather 
than design, and meets the 'no or negligible loss of information' property, it may 
be an SPN according to our definition. 

3.3. Discussion on Information Loss 

So far we have used the term 'information' in a qualitative sense only. To concretely 
define the 'nodes are lossy compressors' and 'pooling with no or negligible information 
loss' properties, it is necessary to state the technical sense in which we mean 
'information.' There are several possible ways of analyzing the performance of an SPN, 
depending on the signal processing goal of the network. Here we use Shannon mutual 
information [35], although other definitions of information, such as Fisher information, 
and distance measures for discrimination problems, may be equally appropriate [36]. 
The mutual information between two random variables $\alpha$ and $\beta$ is denoted as $I(\alpha; \beta)$, 
and has units of bits per sample, or bits per channel use, depending on the context. 

Suppose an SPN has $N$ nodes, and the information source is the random variable 
$x$, which may be either discrete or continuously valued. We denote the output of 
each individual node as the discrete random variable $y_i$, $i = 1, \ldots, N$, where $y_i$ has 
cardinality $M_i$, and the overall output of the SPN as the discrete random variable $y$, 
with cardinality $K$. Without loss of generality, we label the states of $y_i$ by the integers 
$0, \ldots, M_i - 1$ and the states of $y$ by the integers $0, \ldots, K - 1$. Figure [1] shows a block 
diagram of an SPN as we have defined it. 

3.3.1. Compressive Nodes. Even though in some cases the stochasticity of a node 
may be completely internal, for the general formulation we consider that each node's 
measurements can be corrupted by external noise, $\eta_i$, and recognize that it may happen 
that $\eta_i$ is always zero. We denote the input to each node as a function of the signal 
and noise, $w_i = f_i(x, \eta_i)$. Each node operates on this noisy input to obtain a discretely 
valued measurement $y_i$, according to a conditional probability mass function (PMF) 

$$P_{y_i|w_i}(y_i = u \mid w_i) = g_i(u, w_i), \qquad u \in \{0, \ldots, M_i - 1\}. \qquad (1)$$

In this paper we define $w_i$ as a continuous random variable. This is sufficient to 
guarantee the lossy compression property we require of an SPN, since $y_i$ is a discrete 
random variable. We note that it may be valid for $w_i$ to be a discrete random variable, 
but do not consider this in the current paper. In terms of mutual information, it 
is always true that $I(x; w_i) \geq I(x; y_i)$, since otherwise the data processing inequality 
is violated - see, e.g., Theorem 2.8.1 in [35], and [37]. If $x$ is a continuous random 
variable the inequality is always strict, since $y_i$ is discrete. 

In some of the examples in Section [2] it may appear that the output of a node 
is not a discrete random variable. However we are constructing an abstraction of 
the true physical properties that takes into account only the macroscopic properties 
that impact on the production of an output measurement from the pooling hub. For 
example, in the case of a neuron population, the only property of a neuron that codes 
information is whether it produces an action potential at a certain time. Hence, the 
fact that the output voltage is in reality an analogue variable that may be subject to 
fluctuations, is not material. Random fluctuations in this variable that may impact 
on the pooling hub's output can be incorporated into the PMF model of each node. 

3.3.2. No or negligible information loss. We denote the total number of possible 
states in the vector of individual node outputs as $M_g := \prod_{i=1}^{N} M_i$. Without further 
compression or compaction, this vector can be represented using $B_g = \log_2(M_g)$ 
bits. We are interested in networks for which the pooling function, $y := h(y_1, y_2, \ldots, y_N)$, 
is such that the following two properties hold. 

(i) The pooling function $h(\cdot)$ is such that $K < M_g$, so that the number of bits required to 
represent $y$ without further compression or compaction is $B_y = \log_2(K) < B_g$. 
Without this property, the network cannot be said to 'pool.' 

(ii) The mutual information between the information source and the vector of 
observations is either equal to that between the source and the network output, 

$$I(x; y_1, y_2, \ldots, y_N) = I(x; y), \qquad (2)$$

or is such that 

$$I(x; y_1, y_2, \ldots, y_N) = I(x; y) + \epsilon, \qquad (3)$$

where $\epsilon$ is positive and small compared to $I(x; y)$. This property states what we 
mean by 'no or negligible loss of information.' 

These two properties together mean that the pooling function combines $N$ 
measurements in a way that reduces the number of raw bits required to code the 
measurements, but without incurring the cost of reduced mutual information. 
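
Both properties can be checked by exhaustive enumeration for a small network. The sketch below assumes a hypothetical two-valued source and $N = 4$ identical binary nodes that output 1 with a probability depending on the source value; these numbers, and the function names, are illustrative choices and are not taken from the case studies.

```python
from itertools import product
from math import log2

def mutual_information(px, cond):
    """I(x; z) in bits, for a source PMF px[x] and conditional PMFs cond[x][z]."""
    outputs = {z for row in cond.values() for z in row}
    pz = {z: sum(px[x] * cond[x].get(z, 0.0) for x in px) for z in outputs}
    return sum(px[x] * p * log2(p / pz[z])
               for x in px for z, p in cond[x].items() if p > 0.0)

def vector_and_sum_channels(p_fire, n_nodes):
    """Conditional PMFs of the output vector and of its sum, given each source value.

    p_fire[x] is the probability that any one (identical, conditionally
    independent) binary node outputs 1 when the source takes the value x.
    """
    vec, tot = {}, {}
    for x, p in p_fire.items():
        vec[x], tot[x] = {}, {}
        for bits in product((0, 1), repeat=n_nodes):
            prob = 1.0
            for b in bits:
                prob *= p if b else 1.0 - p
            vec[x][bits] = prob
            tot[x][sum(bits)] = tot[x].get(sum(bits), 0.0) + prob
    return vec, tot

px = {0: 0.5, 1: 0.5}                                # hypothetical binary source
vec, tot = vector_and_sum_channels({0: 0.2, 1: 0.7}, n_nodes=4)

print("states before pooling:", len(vec[0]))         # product of the M_i
print("states after pooling :", len(tot[0]))         # K
print("I(x; y_1..y_N) =", mutual_information(px, vec))
print("I(x; y)        =", mutual_information(px, tot))
```

For this particular network the two mutual informations agree to within floating point error, so Eq. ([2]) holds while the number of output states falls from 16 to 5.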

We remark that Eq. ([2]) does not mean 'lossless compression' in the sense that 
the term is usually understood. In computer science and information theory, 'lossless 
compression' describes compression that is deterministically reversible, such as file 
compression schemes, or Huffman coding [35]. Data is coded in such a way that it can 
be stored and transmitted using fewer bits than the original data but later recovered 
exactly by an inverse operation. When Eq. ([2]) holds in an SPN, the vector $(y_1, \ldots, y_N)$ 
cannot be recovered from $y$, and yet there is no reduction in mutual information. This 
is known as redundancy reduction, or data compaction, rather than lossy compression. 
On the other hand, if Eq. ([3]) holds, the pooling function must be a lossy compressor. 

Clearly, if the pooling function could be designed or controlled, the fact that 
it might compress or compact the network's measurements is not remarkable. We 
reiterate that what we are interested in modelling with an SPN is the situation where 
measurements are combined by 'pooling,' so that compression/compaction from $B_g$ to 
$B_y$ bits, with little reduction in mutual information, emerges solely from the physical 
properties and constraints of the system. Although it could be desirable that the 
pooling function was truly a lossless compressor, this constraint means that the pooling 
function will almost certainly not be a complicated algorithm. This excludes any 
networks where the pooling hub is capable of carrying out reversible lossless compression 
algorithms like Huffman coding, or more sophisticated data compaction algorithms. 

3.4. Discussion on Pooling Functions and Sufficient Statistics 

A useful perspective is to note that Eq. ([2]) is equivalent to stating that $y = h(y_1, \ldots, y_N)$ 
is a sufficient statistic for the observations $y_1, \ldots, y_N$ [35] [36]. If pooling in an SPN is 
lossless, then by definition the pooling function must be a sufficient statistic created 
from the vector of measurements from individual nodes - see Case Study 1 in Section [4]. 
On the other hand, if the loss is negligible, as in Eq. ([3]), we cannot say 
anything definitive with regard to sufficient statistics. Nevertheless, allowing the 
loss of negligible information is an appealing notion, because in many circumstances 
small losses are acceptable. In Case Studies 2 and 3 in Section [4] we use numerical 
methods to show that particular pooling functions that are not sufficient statistics 
incur negligible loss of mutual information, meaning the studied networks are SPNs. 

4. Case Studies 

We make the following assumptions for the three case studies presented in this section: 

• is a scalar input, scalar output function; 

• the cardinality of $y_i$ is the same for all nodes, i.e. $M_i = M \ \forall\, i = 1, \ldots, N$; 

• the information source, $x$, is a discrete time stationary random signal, with no 
memory, i.e. each sample is drawn independently from the PDF $f_x(\cdot)$, with support $S$; 

• all analysis is for a pooling function that sums the individual measurements, i.e. 
$y = h(y_1, \ldots, y_N) = \sum_{i=1}^{N} y_i$; 

• except for the Poisson nodes in Case Study 1, the stochasticity of each node is due 
to external continuously valued iid additive random noise, $\eta_i$, that is uncorrelated 
with the signal, i.e. $w_i = x + \eta_i$. 

Illustrative example plots of the mutual information, as a function of input SNR, are 
presented after discussing each case. In these plots, the input SNR is that at the input 
to an individual SPN node for Gaussian noise, apart from the Poisson case, where 
input SNR is undefined, and so plots are scaled relative to the mean of the Bernoulli 
case. It is straightforward to calculate mutual information by numerical integration 
for any given distribution of $x$, provided the conditional output distribution is known. 
Details can be found, for example, in [13]. In all plots, the signal is assumed to be 
Gaussian. The choice of Gaussian signal and noise is arbitrary. Other finite variance 
distributions do not lead to qualitatively different results. 
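
As an illustration of the numerical integration mentioned above, the following sketch computes $I(x; y)$ for summation pooling of $N$ identical binary threshold nodes with a Gaussian signal and iid Gaussian noise, in which case $y$ given $x$ is binomial. It is a sketch under assumptions of our own rather than the procedure of [13]: the rectangle-rule grid, the integration span, and the parameterization by noise standard deviation instead of SNR are all illustrative choices, and the function names are hypothetical.

```python
from math import comb, erf, exp, log2, pi, sqrt

def phi_cdf(z):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def entropy(pmf):
    """Entropy in bits of a discrete PMF."""
    return -sum(q * log2(q) for q in pmf if q > 0.0)

def mutual_information_sum(n_nodes, noise_sd, signal_sd=1.0, theta=0.0,
                           grid=2001, span=8.0):
    """I(x; y) = H(y) - H(y|x) in bits, integrating over x with a rectangle rule."""
    dx = 2.0 * span * signal_sd / (grid - 1)
    py = [0.0] * (n_nodes + 1)
    h_cond = 0.0
    for j in range(grid):
        x = -span * signal_sd + j * dx
        weight = exp(-0.5 * (x / signal_sd) ** 2) / (signal_sd * sqrt(2.0 * pi)) * dx
        p = phi_cdf((x - theta) / noise_sd)      # P(x + noise > theta)
        pmf = binomial_pmf(n_nodes, p)           # distribution of the pooled count
        h_cond += weight * entropy(pmf)
        for k, q in enumerate(pmf):
            py[k] += weight * q
    return entropy(py) - h_cond

for noise_sd in (0.1, 0.5, 1.0, 2.0):
    print(noise_sd, mutual_information_sum(15, noise_sd))
```

Sweeping `noise_sd` for a fixed `n_nodes` gives a curve of mutual information against noise level for this one configuration; other signal or noise distributions would require replacing the Gaussian density and `phi_cdf`.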

4.1. Case Study 1: Identical 'Poisson-like' nodes 

Here we discuss a class of SPNs where a pooling function that sums the vector of node 
outputs always results in a sufficient statistic for that vector, and hence summation 
satisfies Eq. ([2]). This case study assumes all network nodes have identical PMFs 
conditioned on their inputs, i.e. we let $g_i(u, w_i) = g(u, w_i) \ \forall\, i = 1, \ldots, N$ in Eq. ([1]), 
and denote the cardinality of $y_i$ as $M_i = M \ \forall\, i$. We remark that although the 
following analysis applies to continuously valued yi as well, any such network is not 
an SPN according to our definition. It is possible that future generalizations of the 
SPN concept may relax the condition of discretely valued yi. 

We are interested in the case when the conditional PMF of each node's output, 
conditioned on the input signal, x, is from an exponential family, with a linear sufficient 
statistic [36]. This means functions $a(u)$, $b(x)$ and $Z(x)$ exist such that we can write 

$$P_{y_i|x}(u \mid x) = \frac{a(u) \exp\left(b(x)\, u\right)}{Z(x)}. \qquad (4)$$

When written in this form, Z(x) is known as the partition function, which is very 
important in statistical physics, since most thermodynamical quantities may be found 
from it and its derivatives. The linearity of the sufficient statistic for individual nodes 
leads to a simple pooling function. Since the $N$ SPN nodes are identical, their 
outputs are conditionally iid, and their joint conditional PMF can be written as 

$$P_{y_1, \ldots, y_N|x}(u_1, \ldots, u_N \mid x) = \frac{\left(\prod_{i=1}^{N} a(u_i)\right) \exp\left(b(x)\, y\right)}{Z(x)^N}, \qquad (5)$$

where $y = \sum_{i=1}^{N} u_i$. It is clear that this joint PMF is also from an exponential 
family, with $y$ as a sufficient statistic for $(y_1, \ldots, y_N)$. 

We now discuss two examples of SPNs where the nodes can be described in the 
form of Eq. ([5]), and hence pooling by summation satisfies Eq. ([2]). First, suppose each 
node is an identical binary quantizer that compares its input to a threshold level, $\theta$, 
so that $y_i \in \{0, 1\} \ \forall\, i$. Each node is defined by the PMF 

$$g(u, w_i) = P_{y_i|w_i}(u \mid w_i) = \begin{cases} u, & w_i > \theta \\ 1 - u, & w_i \leq \theta \end{cases} \qquad u = 0, 1.$$

Suppose also that the input to each node is $w_i = x + \eta_i$, i.e. the information source is 
subject to iid additive random noise. Upon letting $p(x) = 1 - F_\eta(\theta - x)$, where $F_\eta(\cdot)$ 
is the cumulative distribution function of the noise, the conditional PMF for given $x$ 
is a Bernoulli distribution with parameter $p(x)$ and can be written as 

$$P_{y_i|x}(u \mid x) = p(x)^u \left(1 - p(x)\right)^{1-u}, \qquad u \in \{0, 1\}.$$

The Bernoulli distribution is known to be an exponential family, and can be rewritten 
in the form of Eq. ([4]) by letting $a(u) = 1$, $b(x) = \log\left(\frac{p(x)}{1 - p(x)}\right)$, and $Z(x) = \frac{1}{1 - p(x)}$. 

As a second example we consider inherently stochastic nodes that are 
conditionally Poisson, with rate $\lambda(w_i)$. The conditional PMF of node $i$ is 

$$P_{y_i|w_i}(u|w_i) = \frac{\lambda(w_i)^u \exp(-\lambda(w_i))}{u!}, \qquad u = 0, 1, ..$$

If there is no external noise, so that $w_i = x\ \forall\ i$, the conditional PMF given the input 
is again an exponential family with a linear sufficient statistic, and can be written in 
the form of Eq. (4) by letting $a(u) = 1/u!$, $b(x) = \log(\lambda(x))$ and $Z(x) = \exp(\lambda(x))$. 
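
The two parameterizations above can be checked numerically. The following is a minimal sketch, assuming the single-node exponential-family form $a(u)\exp(b(x)u)/Z(x)$; the function names are illustrative and are not taken from the paper.

```python
import math


def exp_family_pmf(u, a, b, Z):
    """Evaluate a(u) * exp(b(x) * u) / Z(x) for given values of a(u), b(x), Z(x)."""
    return a * math.exp(b * u) / Z


def bernoulli_pmf(u, p):
    """Bernoulli PMF p^u (1 - p)^(1 - u), u in {0, 1}."""
    return p**u * (1.0 - p) ** (1 - u)


def poisson_pmf(u, lam):
    """Poisson PMF lam^u exp(-lam) / u!, u = 0, 1, 2, ..."""
    return lam**u * math.exp(-lam) / math.factorial(u)


def bernoulli_as_exp_family(u, p):
    # a(u) = 1, b(x) = log(p / (1 - p)), Z(x) = 1 / (1 - p)
    return exp_family_pmf(u, a=1.0, b=math.log(p / (1.0 - p)), Z=1.0 / (1.0 - p))


def poisson_as_exp_family(u, lam):
    # a(u) = 1 / u!, b(x) = log(lam), Z(x) = exp(lam)
    return exp_family_pmf(
        u, a=1.0 / math.factorial(u), b=math.log(lam), Z=math.exp(lam)
    )
```

In both cases the sufficient statistic multiplying $b(x)$ is $u$ itself, which is what makes plain summation the natural pooling function.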

If every node in an SPN has its conditional PMF $g(\cdot)$ described by either of these 
examples, then it is clear that if the pooling function is $y = \sum_{i=1}^{N} y_i$ then pooling 
satisfies Eq. since $y$ is a sufficient statistic for $(y_1,..,y_N)$. Figure 2(a) shows the 
behaviour of the mutual information with increasing input SNR for these examples, 
in comparison with the mutual information of an example of Case Study 3 below. 
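
The claim that summation loses no information for identical binary nodes can be illustrated with a small numerical check. This is only a sketch under stated assumptions: Gaussian noise, a small uniform discrete signal alphabet, and the parameter values shown are all choices made here for illustration, not values from the paper.

```python
import math
from itertools import product


def norm_cdf(z):
    """Standard normal CDF (Gaussian noise is an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mutual_information(px, cond):
    """I(x; output) in bits. px[k] = P(x_k); cond[k] maps outcome -> P(outcome | x_k)."""
    marginal = {}
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            marginal[outcome] = marginal.get(outcome, 0.0) + p * q
    mi = 0.0
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            if q > 0.0:
                mi += p * q * math.log2(q / marginal[outcome])
    return mi


def binary_spn_information(N, theta, sigma, xs, px):
    """Return (I(x; (y_1..y_N)), I(x; y)) for N identical threshold nodes."""
    vector_cond, pooled_cond = [], []
    for x in xs:
        p = 1.0 - norm_cdf((theta - x) / sigma)  # p(x) = 1 - F_eta(theta - x)
        # Joint PMF of the unpooled vector of node outputs (conditionally iid).
        vec = {
            u: p ** sum(u) * (1.0 - p) ** (N - sum(u))
            for u in product((0, 1), repeat=N)
        }
        # PMF of the pooled output y = sum of node outputs (binomial).
        pooled = {
            k: math.comb(N, k) * p**k * (1.0 - p) ** (N - k) for k in range(N + 1)
        }
        vector_cond.append(vec)
        pooled_cond.append(pooled)
    return mutual_information(px, vector_cond), mutual_information(px, pooled_cond)


# Hypothetical discrete signal: seven equiprobable values.
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
px = [1.0 / len(xs)] * len(xs)
i_vector, i_pooled = binary_spn_information(N=5, theta=0.0, sigma=0.7, xs=xs, px=px)
print(f"before pooling: {i_vector:.6f} bits, after pooling: {i_pooled:.6f} bits")
```

The two values agree to within floating-point error, as expected when the pooled output is a sufficient statistic for the vector of node outputs.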

Naturally occurring summation of parallel measurements can occur for either 
example. The binary quantization case occurs in distributed sensor networks that use 
multiaccess channels [20], while the Poisson distribution is widely used to model the 
production and pooling of action potentials in biological neurons [35]. We have shown 
that efficient processing of information is an emergent property of such networks. 

We remark that it has recently been argued that the summation of the output of 
two populations of neurons encoding information about one stimulus is Bayes optimal 
for the estimation of the stimulus, if the likelihood of the population is 'Poisson- 
like' [39]. 'Poisson like' means that the likelihood is an exponential family with linear 
sufficient statistics. This is exactly the case in the examples described here. 



4.2. Case Study 2: Identical M-ary Quantizing Nodes and iid Additive Noise 

We now consider a case where the SPN models the 'averaging of digitized signals' 
scenario outlined in Section 2. We show that while a summing pooling function is not 
a sufficient statistic, the reduction in mutual information after pooling by summation 
is negligible, that is, Eq. (3) is satisfied. 

Here each network node represents an analog-to-digital converter circuit. Each 
digital sample is assumed to be of an independently noisy version of the same signal 
sample. We model each circuit as an identical $M$-ary quantizer that is defined by 
$M - 1$ threshold levels, $(\theta_1,..,\theta_{M-1})$. This means we let $g_i(\cdot) = g(\cdot)\ \forall\ i$. The vector of 
individual node outputs has $M^N$ possible states, but is reduced on pooling to 
$K = N(M - 1) + 1$ states, as shown in Fig. 3. This example is of particular interest in 
the context of an SPN being a network with some redundancy, because in the absence 
of any noise the network has maximum redundancy, i.e. $\eta_i = 0\ \forall\ i$, and is not an 
SPN. 

It is not useful to write each node in the form of Eq. (1), but upon defining 
$\theta_0 := -\infty$ and $\theta_M := \infty$ we can write 

$$y_i = u, \qquad \theta_u < w_i \le \theta_{u+1}, \qquad u = 0,..,M-1.$$

For an SPN with $N$ such nodes, each node can be thought of as providing a single iid 
multinomial trial. There are $N$ trials in total, each of which can produce any of $M$ 
outcomes. The probabilities of each outcome are 

$$P_{y_i|x}(u|x) = \begin{cases} F_\eta(\theta_1 - x) & u = 0, \\ F_\eta(\theta_{u+1} - x) - F_\eta(\theta_u - x) & u = 1,..,M-2, \\ 1 - F_\eta(\theta_{M-1} - x) & u = M-1. \end{cases}$$

Just as for the binary quantizer scenario in Case Study 1, it is well known that the 
PMF of a multinomial trial is from an exponential family, as is the result of N iid 
multinomial trials. A sufficient statistic for the outcome of N multinomial trials is 
the count of how many times each outcome occurs, which can be written as an M — 1 
dimensional vector, since the sum of the counts must add to N. If this sufficient 
statistic is formed, the distribution of the sufficient statistic is given by the multinomial 
distribution. This distribution has $\binom{N+M-1}{M-1}$ states. 

In the binary case, the pooling function that forms the sum $y = \sum_{i=1}^{N} y_i$ is sufficient, 
since it is equivalent to a count of the number of 1s. However, for $M > 2$, a pooling 
function that forms the counts of each of $M$ outcomes does not occur naturally in 
scenarios like the 'averaging of digitized signals' case described in Section 2. Of 
course, there are often many sufficient statistics, and one that may be useful in the 
context of this SPN is stated in the form of the following theorem. 

Theorem 1. Let $S_j = \sum_{i=1}^{N} y_i^j$. Then $S := (S_1, S_2,.., S_{M-1})$ is a sufficient statistic 
for $(y_1, y_2,.., y_N)$. Hence, $I(x; (y_1, y_2,.., y_N)) = I(x; (S_1, S_2,.., S_{M-1})) \ge I(x; y)$. 

Proof. Let $q_m(u)$ be an $(M-1)$-th order polynomial with roots being the integers 
between 0 and $M-1$, except for $m$, and such that $q_m(m) = 1$. This polynomial exists, 
and can be written as $q_m(u) = a_m \prod_{j=0, j \ne m}^{M-1} (u - j)$, where $a_m = \left(\prod_{j=0, j \ne m}^{M-1} (m - j)\right)^{-1}$. 
This set of polynomials is called the Lagrange basis, and we can write 

$$P_{y_i|x}(u|x) = \prod_{m=0}^{M-1} p_m(x)^{q_m(u)} = \exp\left(\sum_{m=0}^{M-1} q_m(u) \log(p_m(x))\right).$$

It is clear that we have a multiparameter exponential family, with an $M-1$ dimensional 
sufficient statistic, the vector $(q_0(u), q_1(u),.., q_{M-1}(u))$. However, it is possible to 
rewrite the PMF in the form 

$$P_{y_i|x}(u|x) = \exp\left(\sum_{m=0}^{M-1} b_m(x)\, u^m\right),$$

and the vector $(u, u^2,.., u^{M-1})$ is also a sufficient statistic. Similar to the binomial 
case in Case Study 1, if there are $N$ iid multinomial trials, their joint distribution 
can also be written as an $M-1$ parameter exponential family, where the vector of 
sufficient statistics is as stated in Theorem 1, after defining $S_j = \sum_{i=1}^{N} y_i^j$. □ 
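
One consequence of Theorem 1 that is easy to check by brute force is that the power sums identify the outcome counts uniquely. The sketch below enumerates every multiset of $N$ outputs from $\{0,..,M-1\}$ and confirms that no two share the same power-sum vector; the function names are illustrative only, and the check covers only the small $N$ and $M$ it is called with.

```python
from itertools import combinations_with_replacement


def power_sums(outputs, M):
    """Return (S_1, .., S_{M-1}) with S_j = sum_i y_i**j."""
    return tuple(sum(y**j for y in outputs) for j in range(1, M))


def power_sums_identify_counts(N, M):
    """True if distinct multisets of N outputs in {0..M-1} give distinct S."""
    seen = set()
    # Each sorted tuple corresponds to exactly one vector of outcome counts.
    for outputs in combinations_with_replacement(range(M), N):
        s = power_sums(outputs, M)
        if s in seen:
            return False
        seen.add(s)
    return True


# The plain sum S_1 alone cannot separate these two trinary outcomes,
# but the pair (S_1, S_2) can.
print(power_sums((0, 2), 3), power_sums((1, 1), 3))  # (2, 4) (2, 2)
print(power_sums_identify_counts(4, 3))  # True
```

This is consistent with the proof: the counts can be recovered from $S$, so $S$ carries the same information as the count vector, while the single sum $S_1$ in general does not.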

So, provided that N > the sufficient statistic S reduces the number of 

measurements without incurring a loss in mutual information, since $B_{\mathbf{y}} = N \log_2(M)$, 
while the raw number of bits required to code $S$ is $B_S = (M - 1) \log_2(N + 1)$. We 
note that this is far less efficient than the count of the frequency of each state. 

Like the state counts, the sufficient statistic $S$ may not be a naturally occurring 
pooling function in an SPN. However, it is feasible that each node's output could be 
raised to a power before summation. So we may naturally be interested in the case 
where the pooling function is simply $y = S_j$ for any integer $j$. Obviously the case of 
$j = 1$ corresponds to the 'averaging of digitized signals' model. Verification that this 
pooling function satisfies Eq. (3) is given by Figure 4. The double peak in Figure 4(b) 
is intriguing. It indicates that there is an optimal input SNR close to 10 decibels 
where the pooling loss approaches zero, for all $N$ considered. Further investigation of 
this is left for future work. 
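
The pooling loss for $M$-ary nodes can be computed directly for small networks. The following sketch assumes Gaussian noise, a uniform discrete signal alphabet, and arbitrary illustrative thresholds; it is not intended to reproduce Figure 4, only to show how the quantities before and after pooling by summation ($j = 1$) might be evaluated.

```python
import math
from itertools import product


def norm_cdf(z):
    """Standard normal CDF (Gaussian noise is an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def outcome_probs(x, thetas, sigma):
    """P(y_i = u | x) = F(theta_{u+1} - x) - F(theta_u - x), theta_0 = -inf, theta_M = inf."""
    cdf = [0.0] + [norm_cdf((t - x) / sigma) for t in thetas] + [1.0]
    return [cdf[u + 1] - cdf[u] for u in range(len(thetas) + 1)]


def mutual_information(px, cond):
    """I(x; output) in bits. px[k] = P(x_k); cond[k] maps outcome -> P(outcome | x_k)."""
    marginal = {}
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            marginal[outcome] = marginal.get(outcome, 0.0) + p * q
    mi = 0.0
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            if q > 0.0:
                mi += p * q * math.log2(q / marginal[outcome])
    return mi


def information_before_and_after_pooling(N, thetas, sigma, xs, px):
    """Return (I(x; (y_1..y_N)), I(x; sum_i y_i)) for N identical M-ary nodes."""
    M = len(thetas) + 1
    vector_cond, pooled_cond = [], []
    for x in xs:
        probs = outcome_probs(x, thetas, sigma)
        vec, pooled = {}, {}
        for u in product(range(M), repeat=N):
            q = math.prod(probs[v] for v in u)  # nodes are conditionally iid
            vec[u] = q
            pooled[sum(u)] = pooled.get(sum(u), 0.0) + q
        vector_cond.append(vec)
        pooled_cond.append(pooled)
    return mutual_information(px, vector_cond), mutual_information(px, pooled_cond)


# Hypothetical setup: N = 4 trinary nodes, nine equiprobable signal values.
xs = [-2.0 + 0.5 * k for k in range(9)]
px = [1.0 / len(xs)] * len(xs)
i_vector, i_pooled = information_before_and_after_pooling(4, (-0.5, 0.5), 0.5, xs, px)
print(f"before: {i_vector:.4f} bits, after: {i_pooled:.4f} bits, "
      f"loss: {i_vector - i_pooled:.4f} bits")
```

By the data processing inequality the loss cannot be negative; how small it is depends on the assumed thresholds, noise level and signal distribution.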

4.3. Case Study 3: Non-identical binary-quantizing nodes 

In Case Studies 1 and 2 we considered only the case where all nodes were identical. 
This allowed us to write their conditional PMFs as exponential families, and find 
sufficient statistics for the vector of N measurements. However, in general, if the 
network nodes do not all have the same PMF it is not possible to write the joint PMF 
as an exponential family, and it is difficult to find sufficient statistics. Nevertheless, 
the traces in Figure 2(b) demonstrate numerically that pooling by summation can still 
lead to a negligible reduction in mutual information. 
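
A comparison of this kind can be set up for a small network of non-identical binary nodes as sketched below. Gaussian noise, the uniform discrete signal alphabet and the particular thresholds are assumptions made here for illustration; they are not the optimized thresholds used for Figure 2(b).

```python
import math
from itertools import product


def norm_cdf(z):
    """Standard normal CDF (Gaussian noise is an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mutual_information(px, cond):
    """I(x; output) in bits. px[k] = P(x_k); cond[k] maps outcome -> P(outcome | x_k)."""
    marginal = {}
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            marginal[outcome] = marginal.get(outcome, 0.0) + p * q
    mi = 0.0
    for p, c in zip(px, cond):
        for outcome, q in c.items():
            if q > 0.0:
                mi += p * q * math.log2(q / marginal[outcome])
    return mi


def information_before_and_after_pooling(thetas, sigma, xs, px):
    """Binary nodes with one threshold each; returns (I before pooling, I after)."""
    vector_cond, pooled_cond = [], []
    for x in xs:
        # Node i fires with probability p_i(x) = 1 - F_eta(theta_i - x).
        p = [1.0 - norm_cdf((t - x) / sigma) for t in thetas]
        vec, pooled = {}, {}
        for u in product((0, 1), repeat=len(thetas)):
            q = math.prod(pi if ui else 1.0 - pi for ui, pi in zip(u, p))
            vec[u] = q  # labelled (ordered) vector of node outputs
            pooled[sum(u)] = pooled.get(sum(u), 0.0) + q  # unlabelled sum
        vector_cond.append(vec)
        pooled_cond.append(pooled)
    return mutual_information(px, vector_cond), mutual_information(px, pooled_cond)


# Hypothetical setup: five nodes with distinct thresholds.
thetas = (-1.0, -0.5, 0.0, 0.5, 1.0)
xs = [-2.0 + 0.25 * k for k in range(17)]
px = [1.0 / len(xs)] * len(xs)
i_vector, i_pooled = information_before_and_after_pooling(thetas, 0.3, xs, px)
print(f"before: {i_vector:.4f} bits, after: {i_pooled:.4f} bits")
```

The first value assumes the decoder sees the labelled vector of node outputs, while the second uses only their sum.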

For this case, the small improvement in mutual information that can be had if 
there is no pooling relies on an assumption that the ordered vector of nodes is available 
at a 'decoder.' In other words, the measurements from each node must somehow be 
labelled. It is more natural in the context of SPNs to assume that measurements 
are combined in a way such that any such labelling is lost. The thought experiment 
at the start of Section [2] is an example where there is no labelling. In such a case, 
there is no way to match events to nodes, and counting the occurrences of each event 
becomes a sufficient statistic. Under such an assumption for Case Study 3, pooling 
by summation loses no information. We have previously referred to this as a 'no 
labelling' property [40]. 



5. Emergent Properties and Unsolved Problems on SPNs 

In Section 1 we stated that SPNs can display some surprising emergent behaviour. One 
of these is suprathreshold stochastic resonance, which can be seen for Case Studies 1 
and 2 in Figures 2(a) and 4(a), in agreement with [3], [31]. Starting with tiny SNRs, the 
mutual information increases gradually from zero with increasing input SNR, reaches 
its peak at an optimal SNR near 0 decibels, and then decreases as the SNR continues 
to increase, that is, suprathreshold stochastic resonance occurs. Identical nodes are 
redundant in the absence of noise, and the mutual information can be no larger than 
that of a single node. However, when independent noise is present at the input to each 
node, all nodes contribute to coding the input signal. Noise can be said to provide a 
benefit relative to the absence of noise. In a sense, suprathreshold stochastic resonance 
occurs because random noise improves sub-optimal compression. However, in many 
examples in Section 2, there may be no other way to improve on compression of the 
network as a whole, and allowing an optimal noise level is the only way to enhance 
performance. 
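
The non-monotonic dependence on noise described above can be reproduced qualitatively with a short calculation. This is a sketch under assumptions chosen here: Gaussian noise, a unit-variance Gaussian signal discretized on a grid, a common threshold at zero, $N = 15$ nodes, and a nominal SNR defined as $1/\sigma^2$. None of these values are taken from the figures.

```python
import math


def norm_cdf(z):
    """Standard normal CDF (Gaussian noise is an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pooled_information(N, sigma, xs, px, theta=0.0):
    """I(x; y) in bits for y = sum of N identical binary threshold nodes."""
    cond = []
    for x in xs:
        p = 1.0 - norm_cdf((theta - x) / sigma)
        # Given x, the pooled output is binomial with parameters N and p(x).
        cond.append(
            [math.comb(N, k) * p**k * (1.0 - p) ** (N - k) for k in range(N + 1)]
        )
    marginal = [sum(w * c[k] for w, c in zip(px, cond)) for k in range(N + 1)]
    mi = 0.0
    for w, c in zip(px, cond):
        for k, q in enumerate(c):
            if q > 0.0:
                mi += w * q * math.log2(q / marginal[k])
    return mi


# Assumed signal: zero-mean, unit-variance Gaussian discretized on a grid with
# an even number of points, so that no grid point sits exactly on the threshold.
step = 0.15
xs = [-3.0 + (k + 0.5) * step for k in range(40)]
weights = [math.exp(-0.5 * x * x) for x in xs]
total = sum(weights)
px = [w / total for w in weights]

mi_low_noise = pooled_information(15, 0.01, xs, px)
mi_mid_noise = pooled_information(15, 0.5, xs, px)
mi_high_noise = pooled_information(15, 10.0, xs, px)
for sigma, mi in ((0.01, mi_low_noise), (0.5, mi_mid_noise), (10.0, mi_high_noise)):
    snr_db = 10.0 * math.log10(1.0 / sigma**2)  # nominal SNR, assumed definition
    print(f"nominal SNR = {snr_db:6.1f} dB   I(x; y) = {mi:.3f} bits")
```

With almost no noise the identical nodes all switch together and the network conveys about one bit, as a single node would; a moderate noise level gives a larger value, and very large noise drives the mutual information towards zero.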

Other emergent behaviours that might not be predicted from the definition of an 
SPN have been demonstrated for the binary node case. These include the following. 

• If the nodes in Case Study 3 are optimized for the network as a whole, for small 
SNRs Case Study 1 is optimal, i.e. all nodes are identical [41]. For intermediate 
SNRs, it is optimal for clusters of nodes to be identical, with more unique nodes 
as the SNR increases. This occurs in a series of bifurcations [41], [5]. 

• The mutual information of very noisy SPNs approaches that of analogue Gaussian 
channels, i.e. $I(x; y) \le 0.5 \log_2(1 + N\,\mathrm{SNR})$, while near noiseless SPNs are limited 
by quantization, $I(x; y) \le \log_2(1 + N(M - 1))$ [40]. 

• Very large SPNs (i.e. $N \to \infty$) behave like multiplicative noise channels, and 
optimal reconstruction of the information source depends only on the noise 
distribution and $N$ [14]. 

• Optimizing the noise distribution in an SPN is like optimizing a neuron's stimulus- 
response curve [Mj [42] . 

As stated in Section 1, the aim of this paper is to define SPNs, and to illustrate 
why SPNs are interesting and generically useful in a number of scenarios. There are 
many areas which remain unexplored, and we simply list some quite general unsolved 
questions that we believe would be useful to address. 

• In what scenarios can naturally occurring pooling functions be sufficient statistics, 
or lead to negligible reduction in mutual information? 

• Can mathematical approaches predict the clustering of nodes seen in [41], [5]? 

• Can our SPN definition be extended to apply to networks with more complex 
topologies, incorporating aspects like feedback loops, adaptation, and cooperation 
between nodes, while maintaining the essential qualitative properties? 

• Can studies of SPNs inspire new designs for electronic systems, similar to [33]? 

• Can the features and emergent properties of SPNs be observed to occur in vivo 
in biological neurons? 

We hope that presentation of these unsolved problems will stimulate further work 
and debate on SPNs in the areas of statistical physics, biophysics and electronics 
design, and that our definition of a stochastic pooling network will eventually evolve and 
be refined. In summary, the SPN concept may be appropriate for networks where 
redundancy is useful for achieving noise reduction and simplicity, and where lossy 
compression is required for efficiency. 

Figure 1. An SPN consists of $N$ nodes, each of which operates on noisy 
versions of the same sample from an information source, the random variable, 
$x$. Each node's output is stochastic, and is defined for a given signal sample by 
a conditional probability distribution. This is indicated by $N$ random variables, 
$\eta_i$, $i = 1,..,N$. It is permissible for $\eta_i$ to be correlated across nodes. The output 
of the $i$-th node is an $M_i$-state discrete random variable, $y_i$, and if $x$ is continuous 
this means each node is lossy. The overall network 'pools' the outputs from each 
node, in a channel governed by physical properties, to provide an overall network 
output, $y$, with $K$ states, and is compressed relative to the vector of node outputs. 




Figure 2. (a) Mutual information for SPNs with (i) additive noise and identical 
binary ($M = 2$) quantizing nodes (black traces); (ii) identical Poisson nodes 
(green traces); and (iii) non-identical noisy binary quantizing nodes (red traces). 
The Poisson case shown is not plotted against SNR, but is scaled such that the 
expected value of $y$ is the same as for the binary quantizing case at each SNR. This 
is achieved by $\lambda(x) = p(x)$. (b) Difference between mutual information before and 
after pooling by summation, for Case Study 3. The plots for Case Study 3 are 
for threshold values chosen to optimize the mutual information in the absence of 
any external noise. 

Figure 3. SPN for Case Study 2. Each of $N$ nodes is an $M$-ary quantizer. 
The input to each node is a common random signal corrupted by iid additive 
noise. The pooling function simply sums the outputs from each node, to result 
in a $K = (M - 1)N + 1$-state discrete random variable. This SPN models the 
averaging of digitized signals, as discussed in Section 2. 




Figure 4. (a) Mutual information for an SPN with additive noise and $N$ identical 
trinary ($M = 3$) quantizing nodes, that each have the same threshold levels $\theta_1$ 
and $\theta_2$. (b) Difference between mutual information before and after pooling by 
summation. 



